
    Génération modulaire de grammaires formelles (Modular generation of formal grammars)

    The work presented in this thesis aims to facilitate the development of resources for natural language processing. Such resources take very diverse forms, because there are several levels of linguistic description (syntax, morphology, semantics, ...) and several formalisms proposed for describing natural languages at each of these levels. Because the formalisms involve different types of structures, a single description language is not enough: for each formalism it is necessary to create a domain-specific language (DSL) and to implement a new tool that uses this language, which is a long and complex task. For this reason, we propose in this thesis a method to assemble modularly, and to adapt, development frameworks specific to linguistic resource generation tasks. The frameworks created are built around the fundamental concepts of the XMG (eXtensible MetaGrammar) approach, which allows the generation of tree-based grammars: a description language that supports the modular definition of abstractions over linguistic structures, together with their non-deterministic combination (that is, by means of the logical operators of conjunction and disjunction). The method is based on assembling a description language from reusable bricks, according to a single specification file. The entire processing chain for the DSL thus defined is assembled automatically from this same specification. We first validated this approach by recreating the XMG tool from elementary bricks. Collaborations with linguists also led us to assemble compilers for describing the morphology of Ikota (a Bantu language) and semantics (by means of frame theory).

    Describing São Tomense Using a Tree-Adjoining Meta-Grammar

    Poster session. International audience. In this paper, we show how the interactions between the tense, aspect and mood preverbal markers in São Tomense can be formally and concisely described at an abstract level, using the concept of projection. More precisely, we show how to encode the different valid orders of preverbal markers in an abstract description of a Tree-Adjoining Grammar of São Tomense. This description is written using the XMG meta-grammar language (Crabbé and Duchier, 2004).

    Décrire la morphologie des verbes en ikota au moyen d'une métagrammaire (Describing the morphology of Ikota verbs by means of a metagrammar)

    Association pour le Traitement Automatique des Langues. This article has been published in the Proceedings of the JEP-TALN-RECITAL 2012 conference. Available online at https://www.aclweb.org/anthology/W/W12/W12-1309.pdf. National audience. In this article, we show how the concept of metagrammars, initially introduced by Candito (1996) for the design of tree-adjoining grammars describing the syntax of French and Italian, can be applied to the description of the morphology of Ikota, a Bantu language spoken in Gabon. Here, we use the expressivity of the XMG (eXtensible MetaGrammar) formalism to describe the morphological variations of Ikota verbs. This XMG specification captures the generalizations across these variations. To produce a lexicon of inflected forms, the XMG specification can be compiled and the result saved to an XML file, which allows its reuse in dedicated applications.

    The Origins of a Rich Absorption Line Complex in a Quasar at Redshift 3.45

    We discuss the nature and origin of a rich complex of narrow absorption lines in the quasar J102325.31+514251.0 at redshift 3.447. We measure nine C IV (\lambda1548,1551) absorption line systems with velocities from -1400 to -6200 km/s, and full widths at half minimum ranging from 16 to 350 km/s. We also detect other absorption lines in these systems, including H I, C III, N V, O VI, and Si IV. Lower ionisation lines are not present, indicating a generally high degree of ionisation in all nine systems. The total hydrogen column densities range from <=10^{17.2} to 10^{19.1} cm^{-2}. We examine several diagnostics to estimate more directly the location and origin of each absorber. Four of the systems can be attributed to a quasar-driven outflow based on line profiles that are smooth and broad compared to thermal line widths. Several systems also have other indicators of a quasar outflow origin, including partial covering. Altogether there is direct evidence for 6 of the 9 systems forming in a quasar outflow. Consistent with a near-quasar origin, eight of the systems have metallicity values or lower limits in the range Z >= 1-8 Z_{sun}. The lowest velocity system, which has an ambiguous location, also has the lowest metallicity, Z <= 0.3 Z_{sun}, and might form in a non-outflow environment farther from the quasar. Overall, however, this complex of narrow absorption lines can be identified with a highly structured, multi-component outflow from the quasar. The high metallicities are similar to those derived for other quasars at similar redshifts and luminosities, and are consistent with evolution scenarios wherein quasars appear after the main episodes of star formation and metal enrichment in the host galaxies. Comment: 16 pages, 12 figures, Accepted to MNRAS, July 201

    Plasmacytoid Dendritic Cell Infection and Sensing Capacity during Pathogenic and Nonpathogenic Simian Immunodeficiency Virus Infection.

    International audience. Human immunodeficiency virus (HIV) in humans and simian immunodeficiency virus (SIV) in macaques (MAC) lead to chronic inflammation and AIDS. Natural hosts, such as African green monkeys (AGM) and sooty mangabeys (SM), are protected against SIV-induced chronic inflammation and AIDS. Here, we report that AGM plasmacytoid dendritic cells (pDC) express extremely low levels of CD4, unlike MAC and human pDC. Despite this, AGM pDC efficiently sensed SIVagm, but not heterologous HIV/SIV isolates, indicating a virus-host adaptation. Moreover, both AGM and SM pDC were found to be, in contrast to MAC pDC, predominantly negative for CCR5. Despite such limited CD4 and CCR5 expression, lymphoid tissue pDC were infected to a degree similar to that seen with CD4(+) T cells in both MAC and AGM. Altogether, our finding of efficient pDC infection by SIV in vivo identifies pDC as a potential viral reservoir in lymphoid tissues. We discovered low expression of CD4 on AGM pDC, which did not preclude efficient sensing of host-adapted viruses. Therefore, pDC infection and efficient sensing are not prerequisites for chronic inflammation. The high level of pDC infection by SIVagm suggests that if CCR5 paucity on immune cells is important for nonpathogenesis of natural hosts, it is possibly not due to its role as a coreceptor. The ability of certain key immune cell subsets to resist infection might contribute to the asymptomatic nature of simian immunodeficiency virus (SIV) infection in its natural hosts, such as African green monkeys (AGM) and sooty mangabeys (SM). This relative resistance to infection has been correlated with reduced expression of CD4 and/or CCR5. We show that plasmacytoid dendritic cells (pDC) of natural hosts display reduced CD4 and/or CCR5 expression, unlike macaque pDC. Surprisingly, this did not protect AGM pDC, as infection levels were similar to those found in MAC pDC. Furthermore, we show that AGM pDC did not consistently produce type I interferon (IFN-I) upon heterologous SIVmac/HIV type 1 (HIV-1) encounter, while they sensed autologous SIVagm isolates. Pseudotyping SIVmac/HIV-1 overcame this deficiency, suggesting that reduced uptake of heterologous viral strains underlies this lack of sensing. The distinct IFN-I responses depending on host species and HIV/SIV isolates reveal the host/virus species specificity of pDC sensing.

    The Eighth Data Release of the Sloan Digital Sky Survey: First Data from SDSS-III

    The Sloan Digital Sky Survey (SDSS) started a new phase in August 2008, with new instrumentation and new surveys focused on Galactic structure and chemical evolution, measurements of the baryon oscillation feature in the clustering of galaxies and the quasar Ly alpha forest, and a radial velocity search for planets around ~8000 stars. This paper describes the first data release of SDSS-III (and the eighth counting from the beginning of the SDSS). The release includes five-band imaging of roughly 5200 deg^2 in the Southern Galactic Cap, bringing the total footprint of the SDSS imaging to 14,555 deg^2, or over a third of the Celestial Sphere. All the imaging data have been reprocessed with an improved sky-subtraction algorithm and a final, self-consistent photometric recalibration and flat-field determination. This release also includes all data from the second phase of the Sloan Extension for Galactic Understanding and Evolution (SEGUE-2), consisting of spectroscopy of approximately 118,000 stars at both high and low Galactic latitudes. All of the more than half a million stellar spectra obtained with the SDSS spectrograph have been reprocessed through an improved stellar parameters pipeline, which has better determination of metallicity for high metallicity stars. Comment: Astrophysical Journal Supplements, in press (minor updates from submitted version)

    Representation and parsing of multiword expressions: Current trends

    This book consists of contributions related to the definition, representation and parsing of multiword expressions (MWEs). These reflect current trends in the representation and processing of MWEs. They cover various categories of MWEs (such as verbal, adverbial and nominal MWEs), various linguistic frameworks (e.g. tree-based and unification-based grammars), various languages (including English, French, Modern Greek, Hebrew and Norwegian), and various applications (namely MWE detection, parsing and automatic translation), using both symbolic and statistical approaches.